CS267 (Chris Pollett)

HW#3 --- last modified February 10 2019 22:00:19.

Solution set.

Due date: Nov 2

Files to be submitted:
  Hw3.zip

Purpose: Learn how to analyze some of our inverted index algorithms. Code BM25. Use trec_eval software.

Related Course Outcomes:

The main course outcomes covered by this assignment are:

LO3 -- Be able to explain where BM25, BM25F, and divergence from randomness statistics come from.

LO5 -- Demonstrate with small examples how incremental index updates can be done with log merging.

LO6 -- Be able to evaluate search results by hand and using TREC eval software.

Specification: This homework consists of three parts: (a) book problems 4.1, 4.3, 5.2, 5.5, and 6.2, which you should put in a file Problems.pdf inside your zip file; (b) a coding portion; and (c) an evaluation portion.

For the coding portion of the assignment, I want you to extend the program you wrote for HW2 so that it accepts bm25 as a possible third argument. You can either start with your own code or extend the HW2 solution. If bm25 is selected as the ranking method, the program should calculate the BM25 scores of all the documents which contain the query terms and output the results in decreasing order of BM25 relevance. The output format should be modified from HW2 so that it corresponds to a trec_eval results file.
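As a rough illustration of the scoring and output steps, here is a minimal Python sketch. It is not the HW2 solution or a required design. It assumes one common form of BM25 (IDF taken as log2(N/N_t), with k1 = 1.2 and b = 0.75), assumes that a document is scored if it contains at least one query term, and assumes the usual six-column results line `query_id Q0 doc_id rank score run_tag`. The names `bm25_rank`, `trec_result_lines`, and the `run_tag` value are hypothetical.

```python
import math
from collections import Counter


def bm25_rank(docs, query_terms, k1=1.2, b=0.75):
    """Rank documents by BM25 score (assumed form; parameters are assumptions).

    docs maps a document id to its list of tokens. Only documents that
    contain at least one query term are scored. Returns a list of
    (doc_id, score) pairs in decreasing order of score.
    """
    num_docs = len(docs)
    avg_len = sum(len(tokens) for tokens in docs.values()) / num_docs
    term_counts = {doc_id: Counter(tokens) for doc_id, tokens in docs.items()}

    # N_t: number of documents containing each distinct query term.
    doc_freq = {
        term: sum(1 for counts in term_counts.values() if term in counts)
        for term in set(query_terms)
    }

    scores = {}
    for doc_id, counts in term_counts.items():
        # Document length normalization term: k1 * ((1 - b) + b * l_d / l_avg).
        length_norm = k1 * ((1 - b) + b * len(docs[doc_id]) / avg_len)
        score = 0.0
        matched = False
        for term in query_terms:
            freq = counts.get(term, 0)
            if freq == 0:
                continue
            matched = True
            idf = math.log2(num_docs / doc_freq[term])
            score += idf * freq * (k1 + 1) / (freq + length_norm)
        if matched:
            scores[doc_id] = score

    # Sort by decreasing score; break ties by document id for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def trec_result_lines(query_id, ranked, run_tag="bm25"):
    """Format (doc_id, score) pairs as results lines, one per document."""
    return [
        f"{query_id} Q0 {doc_id} {rank} {score:.6f} {run_tag}"
        for rank, (doc_id, score) in enumerate(ranked, start=1)
    ]
```

Writing the lines returned by `trec_result_lines` to a text file, one query after another, should give a file of the shape trec_eval reads. Check the exact column expectations against the trec_eval documentation for the version you compile.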

For the last part of the homework, I want you to download and compile the trec_eval program from NIST. I want you to create a text_qrels_file.txt by hand for your HW2 corpus. Then I want you to carry out some experiments that use trec_eval to compare the results returned by cosine ranking with those returned by BM25. Your zip file should contain all the text files you generated for your experiments, and it should contain Experiments.txt, which explains what you did and summarizes the results you got. It should also contain actual trec_eval output.
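To sanity-check the numbers trec_eval reports, it can help to compute one measure by hand from the same two files. The Python sketch below is only an illustration: it assumes qrels lines of the form `query_id 0 doc_id relevance` and results lines of the form `query_id Q0 doc_id rank score run_tag`, and it computes average precision per query. The function names are hypothetical, and ties in score may be ordered differently than trec_eval orders them.

```python
def parse_qrels(lines):
    """Map each query id to the set of doc ids judged relevant (relevance > 0)."""
    relevant = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        query_id, _iteration, doc_id, relevance = parts
        judged = relevant.setdefault(query_id, set())
        if int(relevance) > 0:
            judged.add(doc_id)
    return relevant


def parse_results(lines):
    """Map each query id to its doc ids in decreasing order of score."""
    scored = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        query_id, _iteration, doc_id, _rank, score, _run_tag = parts
        scored.setdefault(query_id, []).append((float(score), doc_id))
    return {
        query_id: [doc_id for _, doc_id in sorted(pairs, key=lambda p: -p[0])]
        for query_id, pairs in scored.items()
    }


def average_precision(ranking, relevant_docs):
    """Average of precision@k over the ranks k at which a relevant doc appears."""
    if not relevant_docs:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant_docs:
            hits += 1
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant_docs)
```

Averaging `average_precision` over all queries gives a value you can compare with the `map` line in the trec_eval output. The program itself is invoked with the qrels file first and the results file second, for example `trec_eval text_qrels_file.txt bm25_results.txt`, where `bm25_results.txt` is a hypothetical name for your BM25 results file.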

Point Breakdown

Book problems (1pt each, graded on the scale 0, 1/2 partial, 1 completely correct): 5pts
BM25 extension of HW2 code works as described (1pt coding guidelines and code elegance; 2pts program works as described): 3pts
Files such as the qrels and results files are contained in the folder and are in the correct format: 1pt
Experiments.txt shows that a reasonable trec_eval experiment was conducted: 1pt
Total: 10pts